Analytical maximum-likelihood method to detect patterns in real networks 
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In order to detect patterns in real networks, randomized grapii ensembles that preserve only 
part of the topology of an observed network are systematically used as fundamental null models. 
However, their generation is still problematic. The existing approaches are either computationally 
demanding and beyond analytic control, or analytically accessible but highly approximate. Here we 
propose a solution to this long-standing problem by introducing a fast method that allows to obtain 
expectation values and standard deviations of any topological property analytically, for any binary, 
weighted, directed or undirected network. Remarkably, the time required to obtain the expectation 
value of any property analytically across the entire graph ensemble is as short as that required to 
compute the same property using the adjacency matrix of the single, original network. Our method 
reveals that the null behavior of various correlation properties is different from what previously 
believed, and highly sensitive to the particular network considered. Moreover, our approach shows 
that important structural properties (such as the modularity used in community detection problems) 
are currently based on incorrect expressions, and provides the exact quantities that should replace 
them. 

PACS numbers: Valid PACS appear here 



I. INTRODUCTION 

Detecting relevant patterns in real networks, a funda- 
mental problem for many research fields [Ij'iSj , relies upon 
the possibility to distinguish the properties explained by 
the presence of simple constraints from more complex 
and nontrivial structural features. For this reason, statis- 
tical ensembles of graphs with specified constraints, and 
otherwise completely random, have been introduced and 
systematically used as a reference to identify non-random 
patterns in a real network [H426|. Such ensembles serve 
also as powerful models to study dynamical processes on 
networks displaying only a set of desired properties, and 
allow to highlight the dynamical effect of each property 
separately. The simplest and most important ensembles 
specify only local constraints. For unweighted networks, 
this amounts to specify the degree ki (number of incident 
edges) of each vertex (i = 1 . . . N where N is the total 
number of vertices), and results in the so-called config- 
uration model [H m [7] • In the weighted case, the cor- 
responding constraint is obtained by fixing the strength 
Si (sum of incident edge weights) of each vertex fT5l417j . 
More in general, one could enforce different or additional 
properties HH [13 [H [H [M23] • 

Unfortunately, as we discuss in detail in what follows, 
it turns out that even in the simplest case with local con- 
straints, the correct generation of random ensembles cor- 
responding to a particular real world network is problem- 
atic. Both analytical and computational approaches pro- 
posed so far have severe limitations. Motivated by this, 
here we propose a new maximum-entropy method that 



is entirely analytical and does not require the generation 
of randomized variants of a real network. Our method 
provides the exact probabilities of occurrence of random 
graphs with the same (average) constraints as the real 
network, from which the expectation values and standard 
deviations (and in principle the higher moments) of any 
topological quantity of interest can be calculated mathe- 
matically, either exactly or using proper approximations. 
Due to its analytical character, our method is extremely 
faster than all the available alternatives. Moreover, it can 
be applied to undirected, directed, binary and weighted 
networks in a unified way. We will illustrate the power 
of our approach on several real-world networks of differ- 
ent nature and type, by studying a range of topological 
properties of interest. 



II. AVAILABLE METHODS AND THEIR 
LIMITATIONS 

We first briefly review the existing problems in the case 
of binary unweighted networks, which is the most fre- 
quently explored situation. A binary unweighted graph 
with N vertices is completely specified by a x TV ad- 
jacency matrix A with entries atj = 1 if the vertices i 
and j are connected, and atj = otherwise. Generally, 
one is interested in comparing the observed topological 
properties of a particular real-world network A* against 
the average properties of a randomized family of net- 
works with the same degree sequence k{A*) — {ki{A*)}, 
where ki{A*) — a*^ is the degree (number of connec- 
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FIG. 1. An illustration of the local rewiring algorithm whose 
iteration allows to computationally explore the microcanoni- 
cal configuration model. 

tions) of vertex i in the network A*. The ensemble of bi- 
nary undirected networks with specified degree sequence 
is known as the configuration model (CM) 51 13 13 ^'^^ 
is currently treated in two very different ways: computa- 
tionally, by explicitly generating many random networks 
with the desired degree sequence and averaging the quan- 
tities of interest across the randomized networks [U [S] , 
or analytically, by using approximations that allow to di- 
rectly estimate the average of topological properties as 
a function of the enforced degree sequence, without ac- 
tually measuring them on any network [7| fS] . Currently, 
both approaches suffer from severe limitations. 

A 'bottom-up' computational approach consists in as- 
signing each vertex i a number of 'edge stubs' equal to 
its observed degree ki{A.*), and randomly matching pairs 
of stubs (avoiding self-loops and multiple links) until all 
degrees reach their desired values {edge stub reconnec- 
tion). However, this procedure is known to get stuck in 
configurations where vertices requiring additional con- 
nections have no more eligible partners [4j [5]. As a 
consequence, one must implement a 'top-down' compu- 
tational approach where the entire real network A* is 
taken as the initial configuration, and a family of ran- 
domized variants is generated by iteratively applying a 
local rewiring algorithm (LRA) where two edges {A, B) 
and (C, D) are randomly selected and replaced by the 
two edges {A,D) and {C,B), if the latter are both not 
already present |4j [5] (see figjl] for an illustration) . This 
generates a microcanonical ensemble (see the Appendix 
for a detailed discussion) where all randomized networks 
have exactly the same degree sequence as the original 
network, and are sampled with equal probability. This 
method has been applied to various networks, including 
the Internet cellular networks [6j and food webs (12j . 
in order to detect higher-order patterns (such as clus- 
tering and motifs) not merely due to local constraints. 
However, this approach is time-consuming since many 
(a number R much larger than the observed number of 
links L [H [21] , even if not rigorously specified) iterations 
of the LRA are required to obtain a single randomized 
network, and the entire process must be repeated several 
times to produce a large number M (again unspecified) 



of randomized networks, on each of which any topolog- 
ical property X of interest must be measured explicitly 
and averaged at the end to obtain an estimate for {X). 
The computational time required to obtain {X) is there- 
fore of the order 0{M -Tr- R) + 0{M -Tx), where Tr is 
the average time required to perform a single successful 
rewiring step and Tx is that required to compute X on 
a single network in the randomized set. Moreover, even 
if the sufficient statistics of the problem is the degree 
sequence k(A*) alone, the above approach requires the 
entire original network A* (or any other network with 
the same degree sequence, which is however difhcult to 
obtain from scratch due to the problems discussed above) 
as the starting configuration, thus making use of much 
more information than required in principle. 

By contrast, analytical approaches seek to provide the- 
oretical expressions to directly obtain the ensemble av- 
erages of topological properties, without generating the 
ensemble computationally. Two main approaches ex- 
ist. One makes use of generating functions for the rel- 
evant probability distributions. In the case we are dis- 
cussing here, the key quantity is the generating function 
g{z) — '^t^z''P{k) of the degree distribution [7]. Unfor- 
tunately, this method assumes that the network is infinite 
and locally tree-like (even if in some cases this approxi- 
mation turns out to perform unexpectedly well even be- 
yond its formal range of applicability [17]), and is thus 
inappropriate if the size of the network is small and if the 
input degree distribution can only be realized by dense 
and/or clustered networks. In this approach, clustered 
or dense networks can only be generated by imposing 
additional constraints besides the degree sequence, such 
as the number of triangles attached to vertices [28] , thus 
leading to a different ensemble which is not the one we 
are seeking to characterize. 

A different approach looks for an analytical expres- 
sion for the probability pij that the vertices i and j 
are connected in the randomized ensemble [5]. Due 
to its probabilistic nature, this approach generates a 
(grand) canonical ensemble where even graphs violating 
the constraint are present and assigned different proba- 
bilities. In such a case, the constraints are realized on 
average, i.e. the expectation value {X) of any specified 
property X is fixed exactly (see Appendix). While this 
approach is indeed very fast in providing averages of the 
desired properties, it has been shown [5] that it makes 
use of a highly approximate expression for pij , valid only 
when the original network is sparse and/or the degree 
distribution is not too broad. This expression is 

_ fc.(A*)fc,(A*) 

where L* ^ L(A*) = E^hiA*)/2 = E.<, is 
the total number of links. While the expected degree 
(ki) — J2jPij generated by the above formula coincides 
with the desired degree ki{A*), the probability pij may 
exceed 1 for pairs of highly connected nodes such that 
ki{A*)kj{A*) > 2L(A*). In general, only if the degree 
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sequence is such that 



fc,(A*)<y2Z(A^=^^fc,(A*) Vz (2) 

then using cq.Q on the real network A* will not lead 
to the above problem. While the above condition is typi- 
cally obeyed by networks with narrow degree distribution 
such as the Erdos-Renyi random graph, it is generally vi- 
olated by scale-free networks displaying a power-law de- 
gree distribution P{k) ^ k^'^ , and this violation becomes 
stronger and stronger as the density of the network in- 
creases. In particular, it is possible to show that in order 
to ensure eq.([2]) the maximum degree kmax in the net- 
work should not exceed the so-called structural cut-off 
kc ~ 7V^/^ [29]. This is particularly evident for dense 
networks where the average degree k = J^i = 2L/N 
remains constant as N increases, so that eq.([2]) remains 
valid only if k^ax < ■\/2L ^ iV^/^. By contrast, ex- 
treme value theory shows that in networks with degree 
distribution P(fc) ^ k~^ the maximum degree scales 
as kmax A^i/(^^i), so that if 7 < 3 (as observed in 
most real-world scale-free networks) then kmax > N^^"^ 
which exceeds fee- The meaning oi pja being larger than 
1 for some pairs of vertices in eq.Q is that, in order 
to actually realize the degree sequence of the real net- 
work A*, one must let i and j be connected by more 
than one undirected edge. Also, since the desired equal- 
ity {ki) — ki{A*) is only ensured if one lets the sum 
in pij = (ki) run over all vertices including i itself, 
one must allow the presence of self-loops in the random- 
ized networks. Thus, even if this is not evident at a 
first glance, the ensemble generated by eq.Q does not 
only contain binary and loop-less undirected graphs and 
is thus not a proper null model for an empirical binary 
loop-less network A* with degree sequence k{A*) violat- 
ing eq.([2]), as is typically the case for real- world networks 
with broad degree distributions. 

An elegant proof that the correct ensemble probabil- 
ity Pij for loop-less graphs with no multiple connections 
differs from eq.Q has been proposed [9 and re-derived 
within the framework of maximum-entropy graph ensem- 
bles [H]. We shall exploit this result to obtain an exact 
method later on. We will also show that in real net- 
works the deviation is stronger than expected, and af- 
fects sparse networks as well. An independent proof of 
the inadequacy of eq.Q is that it does not generate the 
graph A* with maximum likelihood |30j . This can be 
confirmed by treating i as a free parameter and look for 
its value L*^^ that maximizes the probability to obtain 
A*. One finds that L*^^ ^ L{A*), which implies that 
under the maximum likelihood choice (ki) 7^ fci(A*) and 
(L) ^ L(A*), violating the desired constraint on the de- 
gree sequence and the implied one on the number of links 
[30] . This shows that the functional form of pij in eq. ([T]) 
is intrinsically problematic and does not give highest like- 
lihood to A* and to all other graphs with the same degree 
sequence as A*. 



Therefore, while the available analytical methods are 
useful to characterise artificially generated networks with 
special properties, they cannot be used to correctly ran- 
domise any real- world network which is either small, clus- 
tered, or dense. Unfortunately, the above limitations are 
generally ignored, and eq.([T]) is frequently used beyond 
its limits of applicability to estimate connection proba- 
bilities. Moreover, as we note later on, it is also used as 
a key ingredient in order to define important structural 
properties which implicitly rely on a comparison against 
the CM. Analogous problems exist in the analysis of di- 
rected and/or weighted networks. We will consider each 
of these cases separately in what follows. 



III. A FAST AND ANALYTICAL METHOD 

The above discussion highlights that no method de- 
veloped so far succeeds in obtaining randomized prop- 
erties of a particular real-world network such that two 
requests are met simultaneously: i) the method is gen- 
eral and works for any network, even if displaying small 
size, high link density, and large clustering; ii) expected 
values across the ensemble can be computed analytically, 
without sampling the configuration space explicitly. The 
need to resort to the LRA as the only statistically cor- 
rect method available, which however requires the artifi- 
cial generation of many randomized networks, makes the 
general problem very complicated and all its applications 
time consuming. 

In this paper we propose a solution to this long- 
standing problem. We develop an approach that com- 
bines exact expressions for the occurrence probabilities 
of graphs in maximum-entropy ensembles with given con- 
straints [U [m [TU [2TH23] with more recent results about 
the application of the Maximum Likelihood principle to 
graph ensembles [3D]. In the Appendix we describe our 
method in great detail. We start with a general dis- 
cussion which is formally valid for any constraint, and 
then consider explicitly the application to real networks 
where a set of local constraints is enforced. We con- 
sider the cases of binary, weighted, directed and undi- 
rected networks separately. We show that in all these 
cases the enforcement of local constraints always leads to 
exact probabilities that can be easily obtained analyti- 
cally. Then we also consider an extension to non-local 
constraints which can still be dealt with analytically. Fi- 
nally, we compare our (grand)canonical method with the 
corresponding microcanonical ensemble generated com- 
putationally as in the LRA. 

As we show, in all the cases of interest a choice of 
constraints leads to a specific set of coupled nonlinear 
equations to be solved. In such equations, the observed 
values of the enforced topological properties (e.g. the 
degree sequence) determine the values of an equal num- 
ber of 'hidden' parameters in such a way that the real 
network, or any other network with the same constraints 
as the real one, is generated with maximum likelihood. 
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Since only the enforced constraints enter the equations, 
our method only requires the knowledge of the sufficient 
statistics of the problem and not of the whole topology, 
restoring a desirable feature of randomization algorithms. 
Solving the maximum-likelihood equations only takes a 
computational time Te which is negligible if compared to 
the time required to measure any nontrivial topological 
property, and entirely replaces the artificial generation of 
many randomized variants of the original network. 

Once the parameters solving the equations are found, 
they can be directly used to obtain the expectation value 
{X) and standard deviation a[X] of any topological prop- 
erty X of interest analytically. When useful, this also 
allows to obtain a z-score representing the number of 
standard deviations by which the randomized value (X) 
differs form the observed value X{A*). The possibility 
to obtain the standard deviations and/or z-scores is very 
important, because it allows to assess which topological 
properties X are consistent with their randomized value 
{X) within a statistical error, and which deviate signifi- 
cantly from the null expectation. In the former case, one 
can conclude that the enforced constraints completely 
explain the higher-order property X. In the latter case, 
the observed property cannot be traced back to the con- 
straints, and therefore requires additional explanations 
or generating mechanisms besides those required in or- 
der to explain the constraints themselves (it should be 
noted, however, that z-scores can be unambiguously in- 
terpreted only if the property X is normally distributed, 
and this is generally not the case; nonetheless they still 
carry information about the discrepancy between obser- 
vations and the null model). 

Importantly, the time required to compute the expec- 
tation value {X) of a given property X analytically (for- 
mally corresponding to an average over a huge num- 
ber of randomized configurations) is the same as the 
time Tx required to compute the same property on the 
single original network. Therefore our method takes 
only a total time 0{Te + Tx) to obtain {X) exactly, 
which is incredibly shorter than the aforementioned time 
0{M ■ Tn ■ R) + 0[M ■ Tx) required by the LRA to 
obtain {X) approximately. Importantly, Te is indepen- 
dent of the complexity of the topological property X to 
measure, which means that for complicated properties 
0{Te +Tx) = 0{Tx). Therefore for any topological 
property X which can be measured in a large but still 
reasonable time 0{Tx) on the real network, the compu- 
tation of its expectation value {X) will require the same 
time 0(Tx)- If the time required in order to obtain {X) 
is too large, it is because the time required to measure 
X is too large as well. In other words, the property 
X is too complicated to be computed on the real net- 
work itself. In such a case, the problem is not due to 
the method, but to a demanding choice of X for that 
particular network. Note that we are assuming that the 
topological properties of the real network are computed 
using the full adjacency matrix. This is the worst-case 
scenario, since in many cases (especially for sparse net- 



works) it is enough to use reduced information such as 
the list of existing links. For instance, the time to mea- 
sure the clustering coefficient can be significantly shorter, 
using optimized algorithms, on a sparse network than on 
a generic network of the same size (an in this case it 
will also be shorter than the time required to compute 
its randomized value across our ensemble). However, our 
interest is precisely to focus on the (worst) general case 
(e.g. dense and very dense networks), because it is in this 
case that other approaches fail (such as eq. II]) , or become 
extremely time consuming (such as the LRA^ which takes 
longer for denser graphs). 



IV. RESULTS 

We now show the application of our method to real 
networks of various type, by considering several topolog- 
ical properties and their randomized counterparts. 



A. Binary undirected networks 

We start with the simplest case of binary undirected 
networks. One of the most important topological proper- 
ties of a binary network is the correlation between the de- 
grees of adjacent nodes, which has been shown to dramat- 
ically affect various structural and dynamical features [2] . 
These correlations can be measured by the average near- 
est neighbour degree (ANND), which on the real network 
A* is defined as 



fc™(A*) 



Y^j^i J2k^j "'h"'jk 



(3) 



While the degree is a first-order property which only de- 
pends on the number of links (topological paths of length 
one) entering a vertex, the ANND is a second-order prop- 
erty contributed by paths of length 2 (i.e. the terms 
a*ija*jk)- Similarly, a third-order (i.e. involving paths of 
length 3) property is the clustering coefficient c^, which 
represents the fraction of pairs of neighbours of vertex i 
which are mutually connected: 



c,(A*) = 



(4) 



As we mentioned, it is always important to assess 
whether in a particular real network higher-order proper- 
ties arise merely as a consequence of low-level constraints 
or whether they signal additional structural patterns. In 
particular, comparing the real network A* with the CM 
(which provides an ensemble of random networks having, 
on average, the same degree sequence A: (A*) as A*) al- 
lows to assess whether longer topological paths and the 
structural properties involving them are simply a ran- 
dom concatenation of the individual links enforced by 
the degree sequence, or whether they are irreducible to 
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FIG. 2. Application of our method to binary undirected net- 
works. The red points are the empirical data, the black solid 
curves are averages over the configuration model obtained us- 
ing the local rewiring algorithm ^5^, and the blue dashed 
curves are the analytical expectations (± one standard de- 
viation) obtained using our method. The green curves are 
the flat expectations under the Erdos-Renyi random graph 
model, and highlight the average level of correlation in the 
random case. The panels report k"" versus ki (left) and d 
versus ki (right) for: a) and b) the network of the largest US 
airports (TV — 500) [31], c) and d) the synaptic network of 
Caenorhabditis elegans (N — 264) [32], e) and f) the protein- 
protein interaction network of Helicobacter pylori {N = 732) 
|33| , g) and h) the network of liquidity reserves exchanges be- 
tween Itahan banks in 1999 [34] (TV = 215), i) the Internet 
at the AS level (A^ — 11.174) [55] and j) the protein-protein 
interaction network of Saccharomices cerevisiae {N = 4.142) 
|33| . The last two networks are randomized using only our 
method, as the local rewiring algorithm would require much 
more time given the large number of edges. 



first-order constraints. As we discuss in detail in the Ap- 
pendix, our method can solve this problem by making 
use of an auxiliary A^-dimensional vector x = {xi . . .x^} 
of parameters. In particular, one must look for the par- 
ticular values X* that solve the following set of N coupled 
nonlinear equations: 



. , . I + x*x* 



(5) 



where ki{A*) is the observed degree of vertex i in the 
real network A*. Once the parameter values are found, 
they allow to obtain analytically the expectation value 
{X)* of any topological property X across the desired 
ensemble. This simply amounts to replace the adjacency 
matrix entry a*^ appearing in the definition of X{A*) 
with its expectation value 



X, X, 



l + x*x* 



(6) 



which represents the correct expression that should be 
used in place of eq.Q. Similarly, it is possible to obtain 
the standard deviation a* [X] analytically in terms of x* 
(see Appendix). 

In fig(2] we show an application of our method on the 
network of the 500 largest US airports [3T], a synap- 
tic network ^32^, two protein interaction networks |33) . 
an interbank network [34) and the Internet at the Au- 
tonomous Systems level |35j . These are among the most 
studied networks of this type. We compare the correla- 
tion structure of the original networks, as measured by 
the dependence of fcj""(A*) and Ci{A*) on fci(A*), with 
the expected values (fcf")* and (q)* obtained analyti- 
cally using our method. Note that we are averaging the 
values of fc""(A*) and Ci{A*) over all vertices with the 
same degree: this makes our comparison with the val- 
ues (fc"")* and (ci)* consistent, since both real and ran- 
domized quantities can be plotted using the same values 
{ki}* — ki{A*) on the abscissa (we use the same strat- 
egy in what follows) . We also highlight the region within 
one standard deviation around the average by plotting 
the curves (fcf")* ± cr*[fc,""] and (q)* ± a*[ci\. For the 
sake of comparison, we also report the average values 
obtained sampling the microcanonical ensemble with the 
standard local rewiring algorithm @1[S], and the expected 
values over the ensemble of random graphs with the same 
number of links (random graph model, RG) As we men- 
tioned, the microcanonical method requires the gener- 
ation of many randomized variants, many rewirings per 
variant, and the measurement of fc"" and Ci on each vari- 
ant separately, plus a final averaging. By contrast, our 
method only requires the preliminary estimation of the 
{x*}. Then the calculation of (fc"") and (c^) takes ex- 
actly the same time as that of the empirical values. As 
can be seen, the two approaches yield very similar re- 
sults (in the Appendix we provide a detailed comparison 
of the two methods). For the two largest networks (the 
protein interactions in S. cerevisiae and the Internet), we 
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only report the expectations obtained using our method, 
as the microcanonical approach would require too much 
computing time. 

The above results allow to interpret the effect of the 
degree sequence on higher-order properties. Firstly, the 
trends displayed by the CM are not fiat as those expected 
in the random graph case. This confirms that residual 
structural correlations, simply due to the enforced con- 
straint, are still present after the rewiring has taken place. 
The presence of these correlations does not require any 
additional explanation besides the existence of the con- 
straints themselves. This is very different from the pic- 
ture one would get by using the (wrong) expectation of 
eq.([T]) which would yield flat trends as well, naively sug- 
gesting that correlations can never be traced back to the 
degree sequence alone. Secondly, while the trends ob- 
served in all the networks considered are always decreas- 
ing, they unveil different correlation patterns when com- 
pared to the randomized trends. The real interbank data 
are almost indistinguishable from the randomized curves, 
meaning that structural constraints can fully explain the 
observed behaviour of higher-order network properties. 
Instead, in the airport network the randomized curves 
lie below the real data (except for an opposite trend of 
(A:"") for low degrees). This means that the real net- 
work is more correlated than the baseline randomized 
expectation, and indicates that additional mechanisms 
producing positive correlations must be present on top 
of structural effects. By contrast, in the H. pylorVs pro- 
tein network the expected curves lie above the real data, 
suggesting the presence of mechanisms producing nega- 
tive correlations. The same is true for the correlation 
structure of the Internet, confirming previous results [S], 
while S. cerevisiae^s protein network is completely differ- 
ent from its randomized variants. Therefore seemingly 
similar trends can actually reveal very different types of 
structural organization. This means that measuring the 
topological properties alone is uninformative, and makes 
the comparison between real data and randomized en- 
sembles essential. Thus the possibility to analytically 
and quickly characterize the latter, which was previously 
unavailable, is a remarkable advantage of our approach. 



B. Directed networks 

We now consider binary directed networks, which 
are specified by an asymmetric adjacency matrix A. 
The local constraints are now represented by the joint 
sequence of out-degrees and in-degrees = 
j^z J '^j^i ^ij } ■ Given a particular real network A* 
and a measured topological property X(A*), our method 
allows to analytically obtain the expectation value {X) * 
and standard deviation a* [X] across the ensemble of bi- 
nary directed graphs with, on average, the same directed 
degree sequences A:°"*(A*) and fc™(A*) as A* [directed 
configuration models DCM). As shown in the Appendix, 
in this case our method makes use of two A^-dimensional 
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FIG. 3. Application of our method to directed networks. Red 
points are the empirical data, the black solid curves are ex- 
pectations under the directed configuration model using the 
local rewiring algorithm, and the blue dashed curves are the 
exact expectations obtained using our method (± one stan- 
dard deviation). The green curves are the flat expectations 
under the directed version of the Erdos-Renyi random graph 
model. The panels report fcf"''" versus (left) and fc™'""' 
versus (right) for: a) and b) the directed neural net- 

work of Caenorhabdttis elegans (N = 264) [32], c) and d) the 
metabolic network of Escherichia coli (N = 1078) [36] , e) and 
f) the Ltttle Rock Lake food web {N = 183) [37]. For the C. el- 
egans network, we also show the microcanonical standard de- 
viations obtained using the LRA (black dotted curves), which 
are indistinguishable from the grandcanonical ones. 



vectors x, y of auxiliary variables, and requires that these 
parameters are set to the particular values x*, y* that 
solve the following set of 2N coupled nonlinear equations: 



^^^ = fc™*(A*) 



+ x*y* 



= fcr(A*) 



(7) 



(8) 



The quantities x* , y* allow to obtain {X)* and (T*[X] an- 
alytically and quickly, outperforming the directed version 
of the LRA [? ]. Note that, as in the undirected case, 
the method only makes use of the sufficient statistics of 
the problem. 

We apply our method to various directed networks, 
by studying the second-order topological properties mea- 
sured by the outward ANND and the inward ANND, 



7 



which are defined as two natural generahzations of eq. ^ : 

(9) 



j^nn.out 



Y^j^i Y^k^j "'ijO'jk 



'(A*) = ^'^^i 



E 



In figjs] we plot the observed values fc""'" 
/cf (A*) and A:f"'°"*(A*) versus /c™*(A*) 



(10) 

(A*) versus 
, as well as 



the expectations (fc^ 



± cr \k. 



and {k. 



nn,out\ ^ 



± 



tT*[fc""'°"*] obtained using our model (see Appendix), for 
three real directed networks: the neural network of C. 
elegans |32j (now in its directed version), the metabolic 
network of E. coli [36], and the Little Rock Lake food 
web |37] . As before, we also show the microcanonical av- 
erage obtained using the LRA and the expectation under 
the directed random graph model (DRG) with the same 
number of links. Again, we find a very good agreement 
between the two approaches, confirming that our method 
yields the correct prediction in incredibly shorter time 
(see Appendix for a discussion about the convergence 
time of the LRA to our exact results). For the C. elegans 
network (fig. [3^-b), we also show the microcanonical 
standard deviations, which turn out to be indistinguish- 
able from the grandcanonical ones. We also confirm that 
while some networks (C. elegans and E. coli) are almost 
consistent with the null model, others {Little Rock) de- 
viate significantly. 

However, the most interesting point for the present 
analysis is that, while for the undirected networks con- 
sidered above all randomized trends were decreasing, in 
this case we find that the three randomized trends be- 
have in totally different ways. In the neural network. 



both (fcj'"'™)* and {k": 



nn.out\ >^ 



are approximately con- 



stant. This means that the baseline behavior for both 
quantities is flat and uncorrelated (as in the directed 
random graph, but at a different level). By contrast, 
in the metabolic network the expected curves are de- 
creasing, and thus the ensemble of randomized networks 
is disassortative as for the undirected graphs considered 
above. Finally, in the food web the constraints enforce 
unusual positive correlations, and the randomized ensem- 
ble is even assortative. Interestingly, while it is expected 
that random networks with specified degrees display a 
disassortative behavior [51 [5] , the assortative trend is to- 
tally surprising. This is because our method extracts 
the hidden variables directly from the specific real world 
network, rather than drawing them from ad hoc distribu- 
tions. The resulting values can be distributed in a very 
complicated fashion, invalidating the results obtained un- 
der other hypotheses. To further highlight this important 
point, we selected three more food webs characterized by 
a particularly small size (see figQ. Small networks can- 
not be described by approximating the mass probability 
function of their topological properties (such as the de- 
gree) with a continuous probability density. Therefore 
in this case the difference between the expectations ob- 
tained by drawing the x and y values from analytically 
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FIG. 4. Application of our method to small-size directed food 
webs. Red points are the empirical data and the blue dashed 
curves are the exact expectations (± one standard deviation) 
under the directed configuration model obtained using our 
method. The green curves are the flat expectations under 
the directed version of the Erdos-Renyi random graph model. 
The panels report fc""'*" versus kl" (left) and versus 
fcf' (right) for: a) and b) the Narragansett Bay web (A'' = 35) 
[58] . c) and d) the Mondego Estuary web (N = 46) [38], e) 
and f) the St. Marks River web (N = 54) [3H]- For the 
latter, in g) and h) we also compare the empirical data with 
the expectations under the reciprocal configuration model, 
where also the number of reciprocated links of each vertex is 
specified. 



tractable continuous distributions and those obtained by 
solving eqs.([8]) using the empirical degrees is particularly 
evident. As we show in figjd] (where for simplicity we 
omit the comparison with the LRA), we confirm that 
the (directed) CM can display not only flat or decreasing 
trends, but also increasing ones. Importantly, in this case 
all three webs do not deviate dramatically from the null 
model. This means that while one would be tempted to 
interpret the three observed trends as signatures of dif- 
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ferent patterns (zero, negative and positive correlation), 
actually in all three cases the observed behavior can be 
roughly replicated by the same mechanism and almost 
entirely traced back to the degree sequence only. This un- 
expected result highlights once again that the measured 
values of any topological property are per se entirely un- 
informative, and can only be interpreted in relation to a 
null model. 



C. Reciprocity and motifs 

So far, in our analysis of directed networks we have 
considered second-order topological properties. In prin- 
ciple, third-order properties can be studied by introduc- 
ing directed generalizations of the clustering coefficient 
[39l |40] . However, there is a proliferation of possible 
third-order patterns due to the directionality of links. For 
this reason, a more complete analysis consists in count- 
ing (across the entire network) all the possible directed 
motifs [6 involving three vertices, and comparing the em- 
pirical abundances with the expected ones under the null 
model. As we show in a moment, our method lends itself 
admirably in such a case. Before presenting our results, 
we note however that directionality makes the possible 
specifications of the null model proliferate as well. In 
particular, besides the DCM considered above, a more 
refined way to randomize directed networks includes the 
possibility to enforce additional constraints on the reci- 
procity structure ,61, illj. In other words, it is possible 
(and important in many applications [SlIU]) to preserve 
not only the total numbers fc™ and fc°"* of incoming 
and outgoing links of each vertex, but also the number 
= aijdji of reciprocated links (pairs of links in 
both directions) [41] |42] . This specification is equivalent 
to enforce, for each vertex i, the three quantities [TTl BT] 
kj^ = ^Ylij^i '\j (number of non-reciprocated outgoing 
links), kX' = 'Ylij^i'^Tj (number of non-reciprocated in- 
coming links) and fcf* = X^j^i '^Tj (number of recipro- 
cated links), where = aij(\ — aji), = aji{l — Uij) 
and = a^jaji. 

Given a real directed network A* , we denote the null 
model with specified joint reciprocal degree sequences 
{k~^{A*),kX{A*),kl^{A*)} as the reciprocal configura- 
tion model (RCM). This is an example of model with 
nonlocal (second-order) constraints which can still be 
treated analytically using our method. As we show in 
the Appendix, in this case one must solve the following 
3A^ coupled equations: 



y - 

^ 1 + X*V* - 



+ x*y*+x*y*+z*z* 
+ x*y* +x*y* +z*z* 



5' l + < 



,y* + x*y* + z* z* 



-fcr(A*) 
= fcr(A*) 
= fcr(A*) 



Vi (11) 
Vi (12) 
Vi (13) 



The expectation value of any topological property, as well 
as its standard deviation, can now be calculated analyti- 
cally in terms of the three A''-dimensional vectors x* , y* , 
z* . For instance, in fig|4^-h we repeat the analysis of the 
directed ANND of the St. Marks River food web, now 
comparing the observed trend against the RCM. In this 
case, we find no significant difference with respect to the 
DCM considered above (fig|4j3-f). However, as we now 
show, the analysis of motifs reveals a dramatic difference 
between the predictions of the two null models. 

If denotes the number of occurrences of a particu- 
lar motif m, our method allows to calculate the expected 
number (Nm)* and standard deviation a*[Nm\ exactly 
(see Appendix), and thus to obtain the z-score 



j[A^m] = 



(14) 



analytically. This can be done for both the DCM and 
the RCM. The value of 2[Afm] indicates by how many 
standard deviations the observed and expected numbers 
of occurrences of motif m differ. Large values of 2;[7Vm] 
indicate motifs that are either over- or under-represented 
under the particular null model considered, and that are 
therefore not explained by the lower-order constraints en- 
forced. In fig[5]we show the z-scores for all the possible 
13 non-isomorphic connected motifs with three vertices 
in 8 real food webs, for both null models. We also show 
the two lines z = ±2 to highlight the region within 2 
standard deviations from the model's expectations. This 
analysis is similar to that of ref.[12j, but is made much 
simpler by our method which does not require to ran- 
domize the webs through a computational algorithm pre- 
serving the (reciprocal) degree sequences. The food webs 
considered here are from different ecosystems (lagoons, 
marshes, lakes, bays, estuaries, grasses), with a preva- 
lence of aquatic habitats. The presence of (intrinsically 
directed) predator-prey relationships implies that reci- 
procity is a very important quantity in food webs |12| . 
Thus the RCM should fluctuate less than the DCM. In- 
deed, this is confirmed by our analysis. The z-scores for 
the motifs to = 2, 3, 13 are significantly reduced from the 
DCM to the RCM. Also, while the motifs m = 1, 6, 10, 11 
display large values of z with opposite signs across dif- 
ferent webs under the DCM, the signs of all statistically 
surprising motifs (i.e. when \z\ > 2) become consistent 
with each other under the RCM (except for m — 13). 
As a consequence, under the RCM all networks display 
a very similar pattern, and the most striking features 
of real webs become the over-representation of motifs 
TO = 2, 10 (plus TO = 6, 11, 13 for the Little Rock Lake 
web) and the under-representation of motifs m = 5, 9, 13 
(plus m = 3,7,8 for Little Rock Lake). In particular, 
the under-representation of motif to = 9 (the 3-loop) is 
the most common pattern across all webs, and becomes 
stronger as the reciprocity of the web increases. Also note 
that in a network with no reciprocated links, the num- 
ber of motifs with at least a pair of reciprocated links 
is zero. Under the RCM, the expected number of these 
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FIG. 5. Legend: • - Chesapeake Bay, ■ - Little Rock Lake, ▲ 
- Maspalomas Lagoon, ▼ - Florida Bay, - St Marks Seagrass, 
★ - Everglades Marshes, - Grassland, ♦ - Ythan Estuary. 
Application of our method to the analysis of directed motifs 
involving three vertices in 8 real food webs. Top panel: z- 
scores obtained enforcing only the in-degree and out-degree 
sequences (directed configuration model). Bottom panel: z- 
scores obtained enforcing also the reciprocal degree sequence 
(reciprocal configuration model). 



motifs remains zero. By contrast, their expected number 
under the DCM is always positive. Thus we confirm that 
the upgrade to the RCM is necessary, as its stricter con- 
straints allow to analyze 3- vertices motifs once 2- vertices 
motifs (i.e. all possible dyadic patterns) are correctly 
accounted for. The possibility to treat the RCM analyt- 
ically using our method is therefore an important step 
forward. 



ified properties in a way that is completely analogous 
to their binary counterparts [22l [23]. In a partic- 
ular weighted network W*, the local constraints are 
the strength sequence {si(W*)} — {^jW*^} (undi- 
rected case) or the joint out-strength and in-strength 
sequence {s^^W*), sr{W*)} = (di- 
rected case). We will only consider undirected weighted 
networks. The extension to the directed case is straight- 
forward. The family of randomized weighted graphs 
with the same strength sequence as a real weighted net- 
work is sometimes denoted as the weighted configuration 
model (WCM) [15]. The available microcanonical algo- 
rithms regard each link weight as an integer multiple w 
of a fundamental unit of weight, transform each edge of 
weight w into w edges of unit weight, and rewire the 
latter as in the unweighted case, now ensuring that the 
strength (total number of incoming edges of unit weight) 
of each vertex is preserved. This means replacing a list of 
L* < N{N — l)/2 weighted links, summing up to a total 
weight W* = T.i<jKj^ with W* > N{N - l)/2 un- 
weighed links. As real networks have broadly distributed 
weights summing up to a large W* , this procedure be- 
comes very time consuming as incredibly many rewiring 
steps per randomized variant must be performed. As for 
the binary case, an alternative procedure makes use of 
a naive theoretical expectation [15l |43] for the expected 
weight of a link in the WCM, in analogy with eq.([T]): 



,(W*)g,-(W*) 
2T4^* 



(15) 



However, the above expression has been shown to have 
as many limitations as its binary counterpart, and to be 
incorrect [22] . 

By contrast, as we show in the Appendix, our method 
allows to treat the WCM analytically as in the un- 
weighted case. Note that choosing the unit of weight 
in the WCM (before performing the randomization) is 
in principle arbitrary, but the resulting ensemble will be 
different for different choices. This issue of granularity is 
an open problem that deserves future investigations. Our 
grandcanonical alternative to the WCM is not aimed at 
fixing the problem, but at providing, for a given choice 
of the weight unit in the microcanonical ensemble, the 
corresponding grandcanonical expectation. 

Given a real weighted undirected network W*, our 
method proceeds by finding the particular values {x*} 
solving the N coupled equations 



D. Weighted networks 

Remarkably, our method works equally well for 
weighted graphs (where the binary adjacency matrix 
A is replaced by a non-negative weight matrix W), 
thanks to recent analytical results that allow to charac- 
terize maximally random weighted networks with spec- 



s,(W* 



(16) 



Note the difference of sign with respect to eq. ([s]) . As in 
the binary case, the knowledge of x* allows to obtain the 
expectation value {X)* and standard deviation <t*[A"] of 
any weighted topological property X analytically across 
the ensemble of weighted graphs with, on average, the 
same strength sequence s(W*) as the real network W*. 
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Again, the time required to obtain {X)* is as short as that 
required to measure the empirical value X(W*), as (X)* 
can be obtained by replacing w*j with the expectation 
value 



(17) 



into the definition of X(W* 
naive expectation (15). 



Equation (17) corrects the 



In order to apply our method, we need to choose the 
weighted topological properties to investigate. General- 
izing binary properties to weighted graphs is arbitrary, 
as no unique choice exist [T51 [i5W5] . To better highlight 
the generality of our approach, here we follow ref.[44] 
since it introduces a way to always systematically define 
a weighted counterpart X for every binary property X. 
The idea is to define X as an average of X over the en- 
semble of binary graphs generated by a convenient con- 
nection probability Pij = f(wij) € [0, 1] which is a func- 
tion of the observed weights {wij}. The functional form 
of Pij can in principle be chosen depending on the em- 
pirical properties one wants to detect; however, our pur- 
pose here is using our method to compare the empirical 
properties with the expected ones, rather than comparing 
alternative definitions of the empirical properties them- 
selves. Therefore we make the simplest choice and, given 
a real weighted network W*, we set Pij = w*j/W* where 
= E^<, w*j = s,(W*)/2 is the total weight. This 
choice yields the following definition for the weighted de- 
gree [44] : 



E 



w* 



g.(W*) 

w* 



(18) 



which is simply proportional to the strength. Similarly, 
the weighted ANND and clustering are defined as the 
counterparts of eqs.([3]) and (|4| [ii] : 



A:r'(W* 



ki 



(19) 



(20) 
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FIG. 6. Application of our method to weighted undirected 
networks. Red points are the empirical data and the blue 
dashed curves are the exact expectations obtained using our 
method (± one standard deviation). Green dashed curves are 
the flat expectations under the weighted random graph model, 
WRG [23] . The panels report fcf " versus Si (left) and Ci versus 
Si (right) for: a) and b) the Florida Bay food web (N=128) 
[38) . c) and d) the Italian interbank network (N=215) [H3], e) 
and f) the C. elegans neural network (N=265) [32], g) and h) 
a snapshot of the US airport network (N=332) |31) . 



In analogy with the binary case. A;"" and Ci can be plotted 
against ki (or equivalently Si) in order to investigate the 
correlation structure of the weighted network. 

In fig[6^we analyze the weighted and undirected (sym- 
metrized) versions of four networks we have already con- 
sidered in the previous binary study: the the Florida 
Bay food web, the Italian interbank network, the C. El- 
egans neural network and the US airport network. We 
compare the empirical results with the expected trends 
(± one standard deviation) under the WCM obtained 
using our method. For simplicity, we only show the 
results obtained using our method, and omit the time- 
consuming microcanonical comparison. Note that, since 



the strengths are preserved in the WCM, i.e. (s^)* = 
Si(W*) Vi, the total weight is preserved as well: {W)* — 
W* . We find that the empirical trends are quite scat- 
tered and variable: some are weakly increasing (Florida 
Bay), some are approximately constant (interbank web), 
others first increase and then decrease (airport network). 
These diverse trends must be compared with a null model 
which, unlike naively expected from eq.(15), is not flat 



and displays a not easily characterizable increasing be- 
havior. A common feature is that, with respect to the 
null behavior, real weighted networks are more assorta- 
tive and clustered for low values of the strength, while 
they are less assortative and clustered for high values for 
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the strength. These considerations confirm that, even in 
the weighted case, the empirical trends are uninformative 
by themselves, and always require a comparison with a 
null model. Our method allows to treat the otherwise 
problematic WCM in a simple way, in straightforward 
analogy with the binary case. 

Although we do not consider this possibility here ex- 
plicitly, for weighted networks one could also enforce 
additional constraints on the degree sequence. This 
amounts to specifying not only the strength of each ver- 
tex, but also its purely topological degree [121 [HI HO] ■ In 
this case, sampling the randomized ensemble by means 
of computational algorithms becomes even more difficult. 
By contrast, our method can still be used efficiently, as 
the analytical expressions characterizing the correspond- 
ing maximum-entropy ensemble have been derived re- 
cently [22] . Those results easily allow to obtain the equa- 
tions implied by the ML principle, as well as the expec- 
tation values of network properties over the ensemble, in 
a straightforward fashion. 

For completeness, in figjT] we show the ratios of the 
contraints standard deviations, a^, to the constraints ex- 
pected values, (a quantity known in statistics as coef- 
ficient of variation) , plotted versus the expected values. 
For small values of the constraints oqIhq ~ {[Iq)^^^'^ 
(an approximation valid both for binary and weighted 
networks); the higher the constraints expected values, the 
more important becomes a correction factor whose entity 
(and sign) depends on the particular type of network con- 
sidered (see Appendix for the details of the calculations): 
in the food webs (panel b) the presence of in-degree hubs 
implies the correction to be important even for small out- 
degree vertices. 



V. DISCUSSION 

Our method make use of the correct expressions ([6]) 
and ( |17[ ) for the connection probability and expected 
weight respectively, in place of the incorrect naive ex- 
pressions ([T]) and (15). While the latter depend only on 
the properties (fc^ or Si) of the end-point vertices i and 
J, the former depend on the entire degree or strength se- 
quence through eqs.([5]) and (16). We have shown that 
this has a dramatic effect on the properties of the ran- 
domized ensemble. In particular, we have found that en- 
forcing the same set of constraints in different networks 
can yield very different trends for the randomized prop- 
erties, whose behavior is therefore highly unpredictable a 
priori. The general expectation that randomized higher- 
order properties (such as (fcf") and (c^) in unweighted 
networks or (fc;) and (ci) in weighted networks) are inde- 
pendent of the local ones (ki or Si) turns out to be only 
a very infrequent possibility among the possible scenar- 
ios. Indeed, we have also found increasing and decreasing 
trends for the randomized quantities, and shown that the 
particular behavior displayed by the null model highly 
depends on the particular values of the constraints in the 
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FIG. 7. The panels report a) the ratios a*[ki]/ki plotted ver- 
sus the degrees ki for the binary undirected networks of fig[2] 
b) the ratios o-*[fcr*]/fc,°"* and a*[kl"]/kr plotted versus the 
degrees fcf"' and fe™, respectively, for the binary directed net- 
works of fig|3]and fig|4]and c) the ratios a*[si]/si plotted ver- 
sus the strengths Si for the weighted undirected networks of 
fig[6] The food webs are indicated by means of symbols. The 
black dashed line is the function f{x) = x~^^^ which is ex- 
pected to well reproduce the coefficients of variation for small 
values of the constraints. 



original real-world network. This makes the comparison 
with the particular null model even more important than 
previously expected, and underlines the importance of a 
tractable description enabled by our analytical method. 

The incorrectness of eqs.([T]) and ( 15 ), as well as of their 
directed counterparts, has another series of undesired ef- 
fects, as those expressions have been explicitly used to 
define important structural quantities involved in net- 
work analysis. Indeed, even when not explicitly used 
to randomize a network, null models unavoidably enter 
into the analytical expressions defining many properties 
of interest. For instance, many popular community de- 
tection algorithms make use of the concept of modular- 
ity to evaluate the quality of a partition of the network 
against a null case |46j . A partition into communities 



can be represented by the matrix {(5^}, where Sij = 1 
if vertices i and j are assigned the same community and 
dij = otherwise. For a binary undirected network A*, 
the modularity Q of the partition {dij} has been defined 
as 



Q 



1 

2L* 



Pij) 



(21) 



where pij is the probability that i and j are connected 
in a suitable null model, and the most frequent choice 
is the CM. Similarly, for a weighted undirected network 
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W* the modularity of the partition {%} is [33] 



where 



is the expected weight of the hnk joining i 



and j in the WCM. Unfortunately, the expressions for 



Pij and (wij) are always taken to be eqs.(|T|) and (15) 



respectively. To the best of our knowledge, no rigorous 
assessment of the consequences of using these approxi- 
mations has been provided. Therefore the problems de- 
scribed in the present paper affect any modularity-based 
community detection problem in an uncontrolled way. 
Our methods provides the previously unavailable exact 
expressions ^ and (17), whose values can be inserted 
into eqs.(21) and (22) to have the correct modularity. 



A straightforward analysis of how the correct expres- 
sions change the detected community structure of real 
networks is an important open point to address in the 
future. 



VI. CONCLUSIONS 

We have presented a fast and exact method to obtain 
analytical results about the grandcanonical ensemble of 
randomized variants of a particular real-world network 
that preserve its average local properties. The method 
works for both weighted and unweighted networks, and 
for both directed and undirected graphs. In any case, 
it requires as the input only the strength or degree se- 
quence(s), which represent the sufficient statistics of the 
problem. Our approach can be extended to enforce dif- 
ferent or additional constraints, such as the reciprocity 
structure in directed networks or the simultaneous spec- 
ification of strengths and degrees in weighted networks. 
Notably, our results show that maximally random net- 
works exhibit a diverse range of behaviors which is sensi- 
tive to the particular values of the constraints displayed 
by the real network, making a case-by-case comparison 
of the observed properties with the randomized ones nec- 
essary. This diversity of outcomes is in any case not cap- 
tured by widely used but incorrect expressions for the 
expected properties. Unfortunately, important network 
properties such as the modularity completely rely on such 
expressions, a problem that may have therefore biased 
previous analyses of community structure in networks. 
We believe that our contribution represents a promising 
step towards the identification of relevant information in 
real networks. 



Appendix A: GENERAL 
MAXIMUM-LIKELIHOOD METHOD 



Here we describe our maximum-likelihood method in 
its general formulation. Our approach combines previ- 
ous analytical results (obtained by one of us [11] [53] 



and other authors [HI HH [21]) about the properties of 
maximum-entropy graph ensembles with previous results 
(by one of us [3D]) about the maximum-likelihood esti- 
mation of free parameters in such ensembles, and adds 
to them a new technique to obtain analytical expressions 
for the expectation value and standard deviation of any 
topological property of interest across the ensemble. Af- 
ter describing the method in general terms, we derive the 
explicit expressions that apply in the particular cases of 
local constraints (for undirected, directed and weighted 
networks). We then consider an extension to nonlocal 
constraints, and finally compare our analytical method 
with alternative computational techniques. 



1. Maximum-entropy probability distribution 

Our method aims at characterizing analytically the 
properties of families of randomized variants of a par- 
ticular real network. In more rigorous terms, a fam- 
ily of randomized network variants is a statistical en- 
semble of graphs where a set of structural constraints 
has been specified, and the rest of the topology is com- 
pletely random. Let us denote by G a generic network in 
the ensemble, and by G* the particular real-world net- 
work that we need to randomize. The ensemble will 
consist of all possible networks {G} of the same type 
of G* (binary/weighted, directed/undirected), and will 
include G* itself. For binary (either directed or undi- 
rected) networks, each graph G is completely specified 
by its adjacency matrix A, i.e. G = A. Similarly, for 
weighed (either directed or undirected) networks, each 
graph G is completely specified by its weight matrix W, 
i.e. G = W. We will keep our discussion completely 
general and use G to indicate a graph of any type (di- 
rected/undirected, binary/weighted). Thus G can al- 
ways be thought of as a matrix with entries {gij}, where 
gij represents the (either binary or non-negative) weight 
of the edge (i,j). Any topological property X evaluates 
to X{G) when measured on the particular network G, 
i.e. it is an (arbitrarily complicated) function of the en- 
tries {gij}. 

Each graph G in the ensemble has an occurrence prob- 
ability ^'(G) whose form is determined by the particu- 
lar constrains enforced. This probability must always be 
such that 



1 



(Al) 



where the sum runs over all graphs in the ensemble. The 
expectation value of a topological property X across the 
ensemble is the mean value (ensemble average) 



(X)^^X(G)P(G) 



(A2) 



G 



Let us denote the set of constraints {Ca} by the vector C, 
where each Ca is a topological property that, unlike any 
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other generic property X, we need to tune to the particu- 
lar value displayed by the real network G* . Enforcing the 
constraints exactly, i.e. allowing only the graphs G such 
that C(G) — C(G*), results in a so-called microcanoni- 
cal ensemble characterized by the uniform probability 



P(G) = I l/^[CiG*)] if (7(G) = (7(G 







otherwise 



(A3) 



where J\f[C{G*)] denotes the number of graphs in the en- 
semble for which the value of each constraint Cq equals 
the value Cq(G*). Microcanonical graph ensembles are 
hard to deal with analytically, and they are most often 
sampled computationally by generating many random- 
ized networks explicitly, using probabilistic rules that 
ensure that the constraints are matched exactly. Cur- 
rently, such computational techniques are the only avail- 
able method to randomize a real network. Unfortunately, 
the need to sample the ensemble explicitly and generat- 
ing a large number of randomized graphs makes this ap- 
proach computationally demanding, time consuming and 
beyond analytic control. 

In order to develop a randomization method which 
is fast and analytically tractable, we exploit the results 
in ref.^14^ and consider the alternative possibility to en- 
force the constraints on average, i.e. by only specifying 
their expectation values (C). The resulting ensemble is 
a (grand) canonical one where each graph G is assigned 
a probability P{G) that maximizes the Shannon-Gibbs 
entropy 



5^-^P(G)lnP(G) 



(A4) 



G 



subject to the constraints J2g ^(G) = 1 and (C) = C. 
Maximizing the entropy subject to constraints is widely 
used in statistical mechanics, and in general for prob- 
lems with incomplete information [37J 148) . The desired 
maximum-entropy graph probability can be found by in- 
troducing a set of Lagrange multipliers 6 — {9a} enforc- 
ing the constraints C — {Cq}. The resulting conditional 
(on the value of 6) probability reads [14] 



PiG\e) 



-H{G,e) 
Z0) 



(A5) 



where H{G, 6) is the graph Hamiltonian defined as the 
linear combination 



H{G, 9) = J2 OaCaiG) = 9- (7(G) 



(A6) 



and the normalizing quantity Z(9) is the partition func- 
tion, defined as 



Z(0") = ^e-^(°'' 



(A7) 



The above results show that the graph probability 
P(G|6') always depends on the value 9, which in turn de- 
pends on the cons train ts considered. As a consequence, 
we can rewrite eq.(^A2| more explicitly as a function of 9: 



(AS) 



(X),-^^X(G)P(G| 



G 



where denotes that the ensemble average is evaluated 
at the particular parameter choice 9. The above expres- 
sion clarifies that the expectation value of any topological 
property X depends on the specific enforced constraints 
through 9. Different choices of the constraints imply dif- 
ferent values of 9, P(G|^), and {X)g. 

2. Maximum-likelihood parameter estimation 

As we mentioned, maximum-entropy graph ensembles 



generated by eq. ( A5 ) have been used extensively to char- 



G 



acterize mathematically networks with specified proper- 
ties P m m mi Hg. However, previous studies did 
not focus on the randomization of a particular real net- 
work (which is our main interest here) , but rather on the 
effects that the specification of certain structural prop- 
erties has on other aspects of network topology. As a 
consequence, the Lagrange multipliers {9a} have been 
considered as free parameters, generally drawn from care- 
fully chosen probability densities [m [211 122] that allow 
analytical results, in terms of which the properties of the 
network model have been investigated. In most cases, 
the aim has been to explore the topological properties 
in the thermodynamic limit N ^ oo, where N is the 
number of vertices of the network. This means that only 
generic statistical properties of real networks, such as a 
power-law degree distribution, were used to generate the 
ensemble. However, this implies that the specific prop- 
erties of a particular real network (such as deviations 
of individual vertices from the fitted degree distribution, 
the intrinsic finitcness of the system, etc.) are ignored 
and, more importantly, that there is no correspondence 
between the vertices of the real network and those of the 
model. Thus this approach allows to inspect the prop- 
erties of maximum-entropy graph ensembles, but does 
not allow the latter to be considered as null models of a 
particular real network. As a consequence, it cannot be 
used to detect empirical topological patterns consisting 
of statistically significant deviations from a null network 
model. 

Here we make one step forward and construct, for a 
given choice of the constraints, the particular maximum- 
entropy graph ensemble representing the family of cor- 
rectly randomized counterparts of a given real network 
G*. Explicitly, we consider a grandcanonical ensemble 
of graphs with the same number TV of vertices as the 
real network, and for a given choice of the constraints 
we fit the model defined by eq. ( A5 ) to the empirical net- 
work G*. To this end, we exploit previous results 130] 
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showing that maximum-entropy graph ensembles defined 
by eq. ( A5 ) are a particular class of models for which 



the maximum-likelihood principle provides an excellent 
method of parameter estimation, since they are free from 
problems of bias afflicting other network models. In par- 
ticular, it can be easily shown [3^ that the log-likelihood 



C{e) = lnP(G* 



-H{G*,d) - liiZ{9) (A9) 



to obtain the real network G* is maximized by the partic- 
ular parameter choice 0* such that the ensemble average 
{Ca)g, of each constraint Ca equals the empirical value 
Ca{G*) measured on the real network: 

{Cy = {C)g, = J2 C(G)F(G|r ) = (7(G*) (AlO) 

G 

where we have used (•)* as a shorthand notation to in- 
dicate the ensemble average evaluated at the par- 
ticular value 9*. The above results means that the 
maximum likelihood principle indicates, for maximum- 
entropy graph ensembles, precisely the parameter choice 
that ensures that the desired constraints are met. This is 
not true in general: in other network models, tuning the 
average values of the topological properties of interest to 
their empirical values requires a parameter choice which 
in general does not maximize the likelihood to obtain the 
real network [30] , thus introducing a bias in the analysis. 

The idea to take the observed constraints C{A*) as 
the input and find the 'hidden' values 9* that gener- 
ate those constraints as the most probable ones was al- 
ready proposed in ref.|30] with the purpose of check- 
ing whether 9* correlates with some external set of em- 
pirical non-topological quantities, thus unveiling possi- 
ble mechanisms shaping the network topology. Here we 
make progress, noting that finding the values 9* repre- 
sents a preliminary step in order to generate the ran- 
domized ensemble we are looking for, and have complete 
analytic control over it. This is completely independent 
of whether there are external empirical quantities corre- 
lating with 9* . 



Note that in eqs.(A8) and (AlO) the expectation val 



ues and the model parameters play inverted roles: while 
in eq. ( A8 ) the expectation values are obtained as a func- 
tion of the parameters 9 which can be varied arbitrar- 



ily, in eq.(A10l the observed constraints, which are mea- 



sured on the particular real network and are therefore 
given as an input, are used to fix the model parame- 
ters to the values 9*. Interestingly, this opposite line 
of research has been used quite extensively in traditional 
social network analysis (where maximum-entropy ensem- 
bles of networks are widely used under the names of p* , 
logit or exponential random graph models [49-5 Ij ) but 
has not yet been transferred to the randomization prob- 
lem frequently occurring in complex networks theory. 
As we show below, the maximum-likelihood parameter 
choice is exactly what we need in order to obtain statis- 
tically correct expectations over ensembles of randomized 



variants of any particular real- world network. This allows 
to understand which properties of a real-world network 
can be simply traced back to the enforced constraints, 
and which require more complicated explanations. An- 
other important difference with respect to the main ap- 
proach followed in social network analysis is that our 
method allows to analyze weighted networks in exactly 
the same way as binary graphs, which are instead usually 
not studied within the p* framework. As a consequence, 
some of the analytical results we derive and use represent 
previously unavailable tools to study weighted networks 
(and maximum-entropy ensembles of them) through a 
straightforward analogy with binary networks. Finally, 
in all the applications we consider it is always possible 
to find the maximum-likelihood parameter values 9* ex- 
actly even for large networks, without resorting to the ap- 
proximate techniques traditionally used in social network 
analysis [51'. Therefore our approach extends in many 
directions the connection between exponential random 
graphs and maximum-entropy network ensembles, and 
strengthens considerably the existing relation between 
social science and network theory. 



3. Expectation values of topological properties 



Equation (AlO) provides an implicit expression for 
the value 9* , and solving it is equivalent to maximizing 



eq.( A9 ). The numerical value of the solution 9* is the key 



ingredient we are looking for in order to detect topologi- 
cal patterns in the real network G* analytically, without 
performing any time-consuming computational random- 
ization. Indeed, if we insert the value 9* into eq.(A8l we 
obtain 



^^X(G)P(G| 



(All) 



G 



which provides the exact expected value of any topolog- 
ical property X across the maximum-entropy graph en- 
semble where the expected values (C) of the topological 
properties C chosen as constraints are set equal to the 
empirical values C(G *) m easured on the real network 
G*, as ensured by eq.(AlO). For simplicity, given a real 
network G* and a set of constraints C, we shall some- 
times call {X)* the randomized value of the topological 
property X. The comparison of {X)* with the empirical 
value X{G*) allows to assess whether, in the real net- 
work G*, the topological property X requires additional 
information besides that provided by the properties C. 
If X{G*) is sufficiently close to (X) * (within a statis- 
tical error that we determine in A 4), one can conclude 



that the enforced constraints C fully explain the property 
X. By contrast, if X{G*) is significantly different from 
{X) * , then the properties C do not explain the property 
X, which means that the structure of G* is determined 
by other factors besides those determining C. This allows 
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to assess which topological properties can be traced back 
to (i.e. explained by) the chosen constraints in any real 
network, and which can not. Trivially, if X is one of the 
properties among the enforced constraints (i.e. if X = Cq 
for some a), then eq.(AlO) implies X{G*) = (X)* by 
construction. 

Note that any other parameter choice 0^9* would 
not enforce the chosen constraints and would yield an 
expectation value {X) g different from the desired one, 
i.e. not corresponding to the correct randomized value 
{X)* for that particular network and for that particu- 
lar choice of the constraints. This clarifies why previ- 
ous results US HI HD [H] about the properties of 
maximum-entropy ensembles, that were obtained using 
as a free parameter unrelated to the empirical val- 
ues C{G*) and to the real network G* itself, cannot 
be used in order to solve the pattern detection problem 
considered here. Also note that C{G*) is the sufficient 
statistics of our problem, which completely determines 



of gij reads 



9* through eq.(AlO) and consequently any randomized 
property {X)* . The knowledge of the other topological 
properties of the real network G* is useless. This means 
that two real networks GJ and G2 with exactly the same 
values C{G*i) = C(G2) of the constraints generate the 
same maximum-entropy ensemble, and give rise to the 
same value of 9* and (A')*, as should be. 



Clearly, the possibility to solve eq.(A10l and to ob- 



tain the randomized properties through eq.(All| both 



depend on whether one manages to rewrite the formal 
expression for {X)g in eq.(A8) in a simplified form that 
avoids the unfeasible actual enumeration of all graphs 
{G} in the ensemble. In practical terms, this means that 
not all s pecifications of the constraints C allow to solve 
eq.(A10l and obtain 9* , and not all topological proper- 



ties X allow to be averaged exactly through eq.(All| 



However, as we describe in [B] the first step can always 
be carried out successfully whenever one considers local 
constraints as the ones of interest for us. Similarly, as we 
now show in general and then restate more explicitly in 
each particular case, the expectation value {X)* of any 
higher-order topological property X of interest can be 
rewritten, either exactly or approximately, in a way that 
is only as complicated as measuring X{G*) on a single 
network, rather than on all graphs {G} in the ensem- 
ble. This represents a major advantage of our method: 
the computation of an expectation value across the en- 
tire ensemble of graphs is only as time-consuming as the 
computation of the corresponding observed value on the 
empirical network G*. Thus, if the observed value can 
be computed in reasonable time, the same is true for the 
expectation value. To see this, we write down an approx- 
imated expression for {X)* as a Taylor expansion. Note 
that any property X{G) depends in general on all the 
entries {gij} of the matrix G, which are the fundamental 
degrees of freedom of the problem. The ensemble average 



(A12) 



G 



If we define the gradient matrix of any topological prop- 
erty X{G) as 



va:(g) 



/ 9X(G) 
dgii 



ax{G) 

\ dgNi 



dX(G) 

dgiN 



dX(G) 

dgNN 



(A13) 



and if we denote by (G) the matrix whose entries {G)ij 
are the expectation values {gij), it is possible to expand 
(AT) around (G) and write the multidimensional first- 
order Taylor expansion 

X{G) = X{{G)) + Y,{g.,, - {g,.j)) f |^ 

= X{{G)) + (G - (G)) * VX{{G)) + ... (A14) 

In the above expression, (•)g=(g) means that we are eval- 
uating the quantity in brackets by replacing each gij with 
{gij), and 



A * B EE ^ a^-jhij 

i,3 



(A15) 



denotes the scalar product of two matrices A and B, 
and the double sum runs over all N(N — 1) ordered pairs 
of vertices (with i ^ j). Note that for an undirected 
network, where gij — gji b y construction, half of the 



terms in the sum in eq.( |A14 ) will be equal to zero, since 



one has either dX/dgji = or dX/dgij = 0, depending 
on whether gij or gji appears in the formal definition of 
X. With the above approximation, the expectation value 
of X reads 



{X)=X{{G)) + . 



(A16) 



since the first-order terms vanish. The above formula 
shows that, if one evaluates {X) by simply replacing G 
with (G) into A'(G) {linear approximation), the differ- 
ence with respect to the exact expectation value is only 
in the second- and higher-order terms. This is true for 
any value of 9, on which all expectation values depend. 
As already explained, our metho d con sists in choosing 
the particular value 9* solving eq.(A10l, which yields an 
expectation value 



{X)*=X{{G) 



(A17) 



Among all possible parameter values 9, the choice of 
9* ensures that the deviation of the approximate value 



^((G)) from the exact one ^(G) in eq.(A14) is mini- 
mal, since (G)* is as close as possible to G, if the chosen 
constraints C are chosen as a reference to measure the 
difference between (G) and G. In particular, when X co- 



incides with one of the enforced constraints Ca, eq.( A17) 
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becomes an exact expression, as we mentioned. More- 
over, as we show later in[Bj most topological properties of 
interest in our analysis are either multilinear functions of 
statistically independent matrix elements {gij} or ratios 
of multilinear functions. In the former case, the expec- 
tation value {X)* is exactly X{{G)*). In the latter case, 
the numerator and denominator will be separately eval- 
uated exactly, and the approximation will only affect the 
ratio. In general, ratios of averages can be very different 
from averages of ratios. However we confirmed (see figsj2] 
and [3]) that our estimates for the ratios are in very good 
accordance with what is obtained in the microcanonical 
case using the LRA, where averages of ratios are eval- 
uated exactly. Moreover, recall that we are interested 
in determining an interval of statistically significant val- 
ues around {X)* , rather than {X)* alone. Our results 
(figsj2] and |3]) also show that the difference between the 
microcanonical and (approximate) grandcanonical value 
of {X)* is typically much smaller than the standard de- 
viation of X (that we obtain below), so using eq.(Al7) is 
in any case a very good way to proceed. 

The above discussion clarifies that a good approxima- 
tion to the randomized value {X)* of any topological 
property of interest is given by simply replacing each gij 
with (gij)* in the definition of the property X(G), in the 
same way as the empirical value X(G*) is obtained by re- 
placing each gij with the observed entry g*j of G* in the 
definition of X(G). This means that the empirical value 
X(G*) (if the full adjacency matrix is used, see main 
text) and the approximate randomized value X{{G)*) re- 
quire exactly the same computational time, which makes 
our method faster than any other available alternative 
approach (and in general as fast as possible). Clearly, in 
order to evaluate eq.( A17) the complete knowledge of the 
values 



= ^ftjP(G|r) 



(AI8) 



G 



is required. While for generic choices of C it may be im- 
possible to obtain the formal expression for {gij) g and/or 

the particular parameter value 9* , in [b] we show that lo- 
cal constraints always allow to obtain {gij)* exactly. This 
makes the problem analytically solvable, and implies that 
our method becomes very simple in all the applications 
of interest. 



4. Variances of topological properties 

As we mentioned, another important advantage of our 
method is the possibility to obtain, besides the expec- 
tation value, the analytical expression for the standard 
deviation of any topological property of interest. This 
provides a statistical error allowing to detect significant 
deviations of any empirically observed topological quan- 
tity X{G*) from its randomized value {X)* . To this 
end, we employ the fundamental expression relating the 



variance of a function of many random variables to the 
variances of the latter, whose most popular consequence 
is the general formula for the propagation of errors in 
experimental measurements. In our notation, the vari- 
ance of a topological property X across the ensemble is 
defined as 



a^[X]^{X^)-{X)^ = {{X-{X))^) 



(A19) 



(which depends on 9). Using the linear approximation in 
eq.(A14| we can write 



a^[X] = {[X{G)-X{{G))f 

i,3 *.s 



(A20) 



dX dX 



,dgij dgts / g=(g; 
where 

cr[gtj,gts] = {{gij - {gij)){gts - {gts))) 
= {9ij9ts) - {gij) {gts) 

is the covariance of gij and gts, and 

{9ij9ts) = ^gtjgtsP{G\9) 

G 

For the 'diagonal' terms given by i = t and j 
covariance <T[gij,gts] equals the variance 



+ ■■ 



(^^[9i. 



{9%) - = <^[9iji9t3\ 



(A21) 

(A22) 

s, the 

(A23) 



(again, both a[gij^gts\ and a'^lgij] depend on 9). In a 
different context where X is a function of many exper- 



imental quantities {gij}, eq.(A21) provides the general 
formula for the propagation of errors (from {gij} to X), 
if the measured value of gij is used as the best estimate 
for {gij), and if its experimental error is used in place of 
CT[gij]. Here, we do not need approximate estimates for 
{gij) and cr[gij], since both quantities can be completely 
specified: even if there is always a single observation, i.e. 
the real network G*, the latter generates the entire en- 
semble of graphs which is described by t he probability 



A2 



P{G\9*), as we discussed in detail in 

As for the expectation value {X)* , our approach pro- 
ceeds by evaluating the standard deviati on (t\X ]* at the 
particular parameter value 9* solving eq.(A10|: 



a*[X] 



,gts 



dX dX 
dgij dgts 



G=(G)* 



where 



<^*[9i3,9ts] = {gzjgts)* - {gij)* {gts)" 



(A24) 



(A25) 



Note that, as for the expected values, the above standard 
deviation makes use of the linear approximation and is 
therefore not exact. However, when we measured also 
the microcanonical standard deviations, we found an ex- 
cellent agreement with our grandcanonical ones (see fig. 
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|3^-b), showing that the errors on the estimates of our 
standard deviations are smalL 

Equations (A24) and ( |A25[ ) show that the values 



{9i]9ts) 



Y,9^J9tsP{G\e*) (A26) 

G 



are the fundamental quantities, besides the averages 
{gij)* given by eq.(A18l, required in order to obtain the 
standard deviation a* [X] of any topological property X. 
For generic choices of the constraints (7, obtaining the 
value of {9ij9ts)* can be very complicated or even im- 
possible, as we already discussed for {gij)* ■ However, as 
we will show, local constraints always allow to evaluate 
analytically all the covariances, and hence the standard 
deviation a* [X] of any property X. 

Equation ( |A24 ) is the key expression providing the sta- 
tistical error associated with {X)* . For any topological 
quantity X , our method allows to assess whether the em- 
pirical value X{G*) is consistent with the randomized 
value {X)* within z standard deviations (where z is a 
conveniently chosen threshold value), i.e. whether 



\X{G*)-{XY\<za*[X] 



(A27) 



As long as the above inequality holds, it is legitimate to 
say that the particular property X evidences no signifi- 
cant deviation of the real network G* from a null model 
where the constraints C are specified. This means that 
the observed value X{G*) requires no explanation be- 
sides those accounting for the observed values C{G*) of 
the constraints. By contrast, if the above inequality is 
violated, then one has a signature that the observed net- 
work G* is not completely a result of the specification 
of the constraints C. Additional mechanisms, besides 
those determining the values of the constraints, are at 
work. In other words, higher-order patterns which can- 
not be traced to low-level constraints are present, and 
our method is able to detect them. In practice, in order 
to discriminate between the two possibilities, it is useful 
to compute the two values 



{Xy ±za*[X] 



(A28) 



which delimit the region within which an observed value 
X{G*) would imply the acceptance of the null model 
from the one where an observed value X{G*) would im- 
ply the rejection of the null model. As an alternative, 
rather than fixing a threshold value for z, one can directly 
compute the number of standard deviations by which the 
expected and the empirical value of X differ, i.e. the z- 



X{G*)-{XY 



(A29) 



Large positive (negative) values of z[X] indicate that 
X{G*) is substantially larger (smaller) than expected, 
while small values signal no significant deviation from 
the null model (note however, as mentioned in the main 



text, that z-scores are easily interpretable only for nor- 
mally distributed properties). 

This concludes the description of our method in its 
general form. In what follows, we consider the particular 
case of interest for the present analysis, i.e. when the 
constraints C are (either binary or weighted) local topo- 
logical properties, or when they are nonlocal but sim- 
ple enough to preserve the analytical character of the 
method. 



Appendix B: LOCAL CONSTRAINTS 

The most important case is when the constraints C 
are local (or first-order) topological properties, i.e. prop- 
erties determined by moving only one step away from 
a vertex, thus reaching only its first neighbours. In bi- 
nary undirected networks the fundamental local prop- 
erty is the degree ki = '^j^^ciij, while in weighted 
undirected networks the corresponding quantity is the 



strength = ^ 



In directed networks, a pair 



of inward and outward variants of the same quantities 
(i.e. the in-degree k™- and out-degree or the in- 

strength s" and out-strength s°"*) characterizes the local 
properties of each vertex. Choosing local constraints is 
the natural option when one is interested in understand- 
ing the effects that the specification of low-order prop- 
erties, involving only direct interactions, has on higher- 
order properties involving longer chains of interactions. 
In what follows, we therefore discuss our method in de- 
tail in the particular case of local constraints. We will 
consider both binary and weighted networks, and both 
undirected and directed links. Importantly, we will show 
that in all these cases the graph probability P{G\6) fac- 
torizes as 



P{G\9) = l[D.,,ig,„g,0) 

i<3 



(Bl) 



where the product runs over all unordered pairs of ver- 
tices (i,j) (with i < j) and Dij{g,g'\6) is the dyadic 
probability that the pair {gij,gji) takes the particular 
value [9,9'), i.e. the joint probability that gij = g and 
simultaneously gji = g' . Clearly, 

D,j{g,g'\e) = D,,{g',g\e) (B2) 

Note that Dij{g, g'\6) is normalized such that 

^A,(5,ff'|^l = l (B3) 



where g and g' run over all the allowed values for gij and 
gji ((7 = 0,1 for binary networks, while 5 = 0, 1 • • • -I- 00 
for weighted networks; the same for g'). The marginal 
probability that gij takes the particular value g, indepen- 
dently of the value of gji , is 



P^J{g\0)=J2D^A9,9'W) 



(B4) 



9' 
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and, consistently with eq.([B3|, is normalized such that 
^P.,(5|0l = l. (B5) 



Note that for undirected networks, where gij — gji by 
construction, we have 



D,,{g,g'\6) = 5,,,yP.M^) (B6) 

where 5g^gi = 1 if g = g' and 5g^g' — ii g ^ g' . 

The factorization of P{G\9) according to eq.(Bl| im- 
plies that eq.(A12| can be rewritten as 



(B7) 



By contrast, ii i = t and j = s then 



{g^m,) = {9l) = Y.,9^P^A9\o) 



^[gi3^9r]] = {91 j) - {9r]) 

Finally, if i = s and j — t we have 



{9^39J^) = llg.,g' 99' Dij{9 ^ 9' \^) 

^[gij^g^t] = {gijgji) - {9i3){9jt) 



(B12) 
(B13) 

(B14) 
(B15) 



Again, all the above quantities can be obtained analyt- 
ically and evaluated exactly at the particular value 0* 
solving eq.(B8). As a consequence, if eqs.(Bll|, (B13) 



and (B15) are inserted into eq.(A21), we find that the 
expression for the variance [X] of any topological prop- 



erty X reduces from eq. ( A24 ) to the simpler formula 



which can alwa ys be obtained analytically. Using the 
latter, eq.(A10l can be simply rewritten exactly as 



C((G)*) = C(G*) 



(B8) 



which allows the maximum-likelihood parameter values 
9* appearing in (G)* to be easily calculated numerically. 
Alternatively (e.g. depending on the software used) one 
can calculate 6** by directly maximizing the log- likelihood 
defined in eq.(A9), which in this case takes the simpler 
form 



Cie)^lnP{G*\0)^J2^nD,,igl 



(B9) 



(we always adopted the maximization of the log- 
likelihood). In both cases, even for very large networks 
this preliminary parameter estimation takes a negligible 
time with respect to the calculation of any nontrivial 
topological property. This implies that eq.(B7) can al- 



ways be evaluated exactly at the particular parameter 
choice 0* , providing the correct value (gij)* in terms of 
which the ensemble average {X) * of any topological prop- 
erty X can be obtained analytically through eq.(A17|. 



Thus, as we discussed, the time required to obtain {X}* 
(which formally is an average over all possible graphs in 
the ensemble) is just the same as that required in order to 
measure X{G*) on the real network G*. This makes our 
method incredibly faster than other randomization pro- 
cedures that require the actual computational generation 
of many randomized variants (necessarily sampling only 
a part of the ensemble) of the real network, on each of 
which X must be computed explicitly before performing 
a final average approximating {X)* . 

The standard deviation a* [X] of any property X can 
be evaluated very easily as well. Equation eq. ( Bl I implies 
that if and {t,s) are two distinct pairs of vertices 
then 



(gtjgts) = {gi]){gts) 
(^[gi3,gts\ = 



(BIO) 
(Bll) 



+ <^*[gtj,g3t\ 



J dx 

^9ij / G=(G)* 

I dX dX \ 
\dg^j dgjij^^^^^yj 



(B16) 



involving only a single sum over pairs of vertices. In 
the above expression, we have kept our convention to let 
the sum run always over all possible ordered pairs of ver- 
tices, thus considering the pairs and {j, i) as distinct 
terms in the summation. For ensembles of directed net- 
works, gij and gji are different random variables which 
may or may not be dependent on each other (depend- 
ing on the enforced constraints, as we show in detail be- 
low). Equation (B16) takes care of both possibilities by 
including the covariance a* [gij , gji] . For ensembles of 
undirected networks, gij and gji are actually the same 
random variable and are thus perfectly correlated, which 
= \J<^*\9i3-,9t3\ = Again, eq. 



(B16) takes care of this by compensating the summation 



over a doubled number of terms with the presence of the 
covariances which exactly restore the correct expression. 
In such a way, one does not have to care whether the 
network is undirected when using eq.(B16l, which there- 



fore applies without modifications to all the cases we will 
consider below. Different cases only differ by the specific 
expression of G*\gij^gji\. This is very convenient when 
implementing the formula computationally. Another de- 
sirable consequence of formally treating gij and gji as 
different variables even in undirected networks is that in 
eq.(B16) the derivative dX/dgts of any function X{G) of 



(a subset of) the entries {gij} can always be computed 
by repeatedly applying the elementary rule 



dgts 



SitS 



it^js 



(BIT) 



(where dij ~ I if i ^ j and Sij — ii i j) for both 
directed and undirected graphs. 
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Summarizing the results discussed so far, we showed 
that for local constraints our method allows (gij)* , {gfj)* 
and {gijgji}* to be computed exactly, and to use them 
in order to obtain the expected randomized value {X)* 
and standard deviation a* [X] of any topological prop- 
erty X through eqs.(Al7) and (BI6I respectively. Un- 



like alternative computational methods, our approach is 
completely analytical and allows to evaluate the random- 
ized value {X)* in just the same time as that required to 
measure X on the original real network G*, plus a neg- 
ligible preliminary time required to f ind the parameter 
values 9* numerically through eq.(B8). The simple steps 



through which our method proceeds in the case of local 
constraints can be summarized as follows: 

1. choose the desired representation for the real net- 
work G* (directed/undirected, binary/weighted) 
and the corresponding grandcanonical ensemble of 
graphs {G}; 

2. specify the local constraints C(G) and use them to 
write the Hamiltonian H{G, 9) ~ 9 ■ C{G) and the 



probability P{G\9) 
eqs.(|A5l)-(|A7l); 



-H{G,9) i^^Q-^ according to 



3. rewrite the graph probability analytically in the 
factorized form P{G\9) = Wi^j Dij{gij,gji\9) ac- 



cording to eq.(Bl I 



4. use Dij{g, g'\d) to determine the basic qu a ntities 
{9ij)i {gfn) and {gijgji) according to eqs.(B7 1, ( B13 ) 
and ( |B15 1 respectively; 



5. numerically determine the m axim um- likelihood pa- 
rameters 9* by solving eq.(B8) or alternatively 
maximizing eq. ( B9 I ; 



6. use 9* to compute the ensemble average {X)* and 
standard deviation a* [X] of a ny des ired topological 
property X, according to eqs.(Al7) and (B16); 



7. assess whether the empirical value X{G*) is con- 
sistent with the randomized one {X)* using either 
the interval in eq. ( A28 ) or the 2-score in eq. ( A29 1 . 



For completeness, in the above list we have included all 
the logical steps involving also the initial derivation of 
the required analytical expressions. However, since those 
expressions have already been derived in the literature 
for all the constraints we will consider in what follows, 
in practice our method reduces to a straightforward ap- 
plication of the last three steps. For clarity, in what 
follows we illustrate the method explicitly for a range of 
useful specific cases, i.e. for various choices of the con- 
straints C and of the topological properties X. We will 
also highlight in more detail the advantages with respect 
to alternative methods. 



1. Undirected configuration model 

For unweighted undirected networks, each graph G in 
the ensemble is uniquely specified by its binary symmet- 
ric adjacency matrix A with entries a^- = aji = 1 if ver- 
tices i and j are connected, and aij = aji = otherwise. 
Generally, one considers loop-less graphs with an = 
unless otherwise specified. This fixes the first step of our 
method according to the list shown above. Thus we can 
replace G — >■ A and gij — > aij in our general notation 
used so far. 

Given a real binary undirected network A* with entries 
{a*j} and degree sequence fc(A*), our method allows to 
compare the properties of A* with those displayed by a 
randomized ensemble of binary undirected graphs hav- 
ing, on average, the same degree sequence as A*. As we 
mentioned in sec. [IT] the available methods have severe 
limitations. In particular, as noted in refs.[5] and |30) . the 
incorrectness of eq.Q is a consequence of the fact that 
it is not a proper maximum-entropy probability over the 
ensemble of binary graphs, i.e. it cannot be traced back 
to a Hamiltonian model as the ones described in A 1 By 



contrast, our method provides the correct solution. The 
appropriate choice is to include the constraint C — k into 
eq. ( A6 ) and obtain the corresponding correct probability 



[T3] . This is precisely what the steps 2-4 of our method 
prescribe. For the sake of completeness, we briefly sketch 
the main results. If C(A) — fc(A), the Hamiltonian reads 

i7(A, 9) = ^'^^(A) = + ^J>^J (^18) 

i i<j 

The partition function can be calculated exactly [13] as 



Z(0l-^e-^(^'«~) = J](l-fe- 



(B19) 



Therefore the graph probability can be written in the 
factorized form dBll) as follows 



P{A\9) = n D^j{a,j,a,S = U P^M^Jm (B20) 
where 



(B21) 



is the mass probability function of a Bernoulli-distributed 
binary random variable aij, with success probability 



1 



(B22) 



representing the probability that a link between i and 
j is present. Introducing the new variable Xi = e~^', 
not to be confused with the symbol X used so far, and 
changing notation from 9 to the expectation value of 



is simply given by 



{o-ij)x — Pij 



X 1 X J 



(B23) 
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Also, since af^ = 0^-, the second moment is 



^ij fx 



(B24) 



Finally, if (i, j) and (t, s) are two distinct pairs of vertices, 
then Gij and ats are independent random variables and 



{ciijats}s — {o-ij)x{o-ts)x 



(B25) 



This completes the fourth step in our method. 

The fifth step consists in finding the particular param- 
eter values X* that maximize eq.(B9), that in this case 
reads 

C{x) = lnP(A*|f) = ^h{A*)l^lx^ - "^Inil + XiXj) 



(B26) 

Equivalently |30' , the parameters x* can be found solving 
the following N cou pled equations enforcing the desired 
constraints as in eq. ( B8 1 : 



A:.(A*) 



(B27) 



Importantly, since Xi = e'^' and di is a real number, 
the solution we are looking for is the one where x* > 
This solution is unique. Even for large networks, the 
above parameter estimation ranges from seconds to tens 
of seconds even on an ordinary laptop. 

Once the parameters x* are found, we can proceed to 
the sixth step and exploit eq.(Al7) to obtain the expec- 



tation values of the properties X of interest: 

{Xy = ^X(A)P(A|x*) = X((A)*) + . . . (B28) 

A 

In particular, the expectation value of the ANND defined 
in eq.([3| is 



(ajk)' 



(B29) 



and the expectation value of the clustering coefficient de- 
fined in eq.Q is 



[aki 



(B30) 



where (aij)* = x*x*/{l -\-x*x*). Similarly, the s tandard 
deviation (t*[X] can be evaluated using eq.(B16), which 
here reads 



a*[X] = 



\ 



a 



dX 

'■^ da,; ■ 



(B31) 



V / A=(A)* 



where cr*[a,y] = ^(a.y)*(l - (a.y)*) = ^x*x)l(\ + 
x*x*^. It is straightforward to obtain (T*[^] in terms 
of X* alone, by using the derivation rule (B17): 



dat. 



(B32) 



This can also be implemented symbolically in adequate 
softwares. Let us calculate explicitly the standard devi- 
ations of the constraints: 



V 3^'^ V 



which in turn imply that 



(B34) 



Given the vertex i, if p*^ <^ 1, j = 1 . . . N and j 7^ i, 

the trend decreases as k~^^'^ (which also represents an 
upper-bound for the ratio). The more this condition is 
violated (the vertex i has an high degree, there are hubs 
in the network, etc.), the more important becomes the 
correction, lowering the ratio to eventually reach zero. 



2. Directed configuration model 

Binary directed networks have an asymmetric adja- 
cency matrix A with entries Uij = 1 if a directed link 
from i to j is there, and a.^ = otherwise. Given 
a real binary directed network A* with out-degree se- 
quence fc°"*(A*) and in-degree sequence A:*" (A*), our 
method provides analytical expressions for the expecta- 
tion values and standard deviations of topological prop- 
erties across the maximum-entropy ensemble of binary 
directed graphs with out-degree sequence fc°"*(A*) and 
in-degree sequence fc™(A*). The Hamiltonian is now 



H{A,d, P) 



[a,fcr*(A)+AA:r(A) 



The partition function can be calculated exactly [T3] as 

A i^^j 

The graph probability is now 



(B35) 



(B36) 



P(A|a,/3) = J]^ Aj(a»j,ai,:|a,/3) = J|Pij(a,j|d,^) 

(B37) 

where 



/^,(a.,|a,/3)=p°;^(l-p,,)(i-"-^) 



and 



Pij 



1 + e~"'-A- 



(B38) 



(B39) 



Setting Xi = e and y, 

{lij)x,y = Pij 



of aij is 



e the expectation value 



1 + Xiyj 



(B40) 
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The second moment is 



3. Weighted configuration model 



(B41) 

Finally, if (i, j) and {t, s) are two distinct pairs of vertices, 
now including the case (i, s) = then 



The log-likelihood ( |B9[ ) to maximize is 

jr{x, y)^Yl [^r"*(A*) Inx, + fcr(A*) \ny,] 



(B42) 



-^ln(l + x,yj) (B43) 

and the values x*, y* that realize the maximum can al- 
ternatively be found by solving the 2N coupled equations 



. 1 + i 



= fcr*(A*) 

= A;r(A*) 



(B44) 



(B45) 



corresponding to eq.(B8). Again, we are looking for the 



solution where x* > Q and y* > Vi. Expectation values 
can still be obtained using eq.(B28|. In particular, the 
directed ANNDs defined in eqs. ^gfand ^ have expec- 
tation values 



(B46) 



(B47) 



where {uij)* — x*y*/{l + x*y*). Similarly, the sta ndard 
devi ation a* \X] can still be evaluated through eqs.(B31) 
and ( 
us ca 
straints 



B32|, now using a*[a,j] = ^x\y*l{\ \- x*y*). Let 
culate explicitly the standard deviations of the con- 



V 



which in turn imply that 



J^out 




(fc°"*)2 



J2j^iiPji) 



(B48) 



(B49) 



(B50) 



(B51) 



Given the vertex i, if p*^ <C 1, j = 1 . . . A and j 7^ i, the 
trend decreases as (A:""*)^^/^ (which also represents an 
upper-bound for the ratio). The more this condition is 
violated (the vertex i has an high out-degree, there are 
in-dcgree hubs in the network, etc.), the more important 
becomes the correction, lowering the ratio to eventually 
reach zero. Similar observations hold for the in-degrees. 



When weighted undirected networks are considered, 
each graph G in the ensemble is specified by its non- 
negative symmetric matrix W whose integer entry Wij 
represents the weight of the link between vertices i and 
j (including Wij = if no link is there). Thus we can 
replace G — >■ W and gij — >■ Wij in the general nota- 
tion. As we mentioned in the main text, in the weighted 
configuration model a real weighted undirected network 
W* with entries is compared with a maximum- 

entropy ensemble of weighted undirected graphs having 
the same strength se quen ce s(W*). In our method, by 
setting C = s into eq. \Am wc obtain the Hamiltonian 



(w, e) = ^M^) = + Oj)t 



The partition function is [22] 



1 



(B52) 



(B53) 



W i<j 

and is only defined ii 6i > Q^i. The graph probability is 

m 



where 



p^,iw.^,\e)^p:;'i^''P^i) 



(B55) 



is the mass probability function of a geometrically- 
distributed |23| integer random variable Wij , with success 
probability 



Pi] 



(B56) 



representing the probability that a link between i and j is 
present. Introducing Xi = e~^' € [Oj1)j the expectation 



value of Wij is 



Pij X^Xj 



1 J^ij 1 XiXj 

Now in general wfj ^ Wij, and the second moment is 

/ 2\ _Pi]{l+Ptj) _ {XiXj)il+XiXj) 



(B57) 



(B58) 



Finally, if («, j) and {t, s) are two distinct pairs of vertices, 
then 

{WijWts)x = {Wij)x{wts)x (B59) 

The log-likelihood ( [B9| ) reads 

£{x) = \nP{W*\x) = ^Si(A*)lna;, + ^ln(l -a;,a;j) 



(B60) 
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and the parameters x* maximizing it solve the following 
N coupled equations 



(B61) 



enforcing the desired constraints as in eq.(B8). Now the 



solution must be looked for in the region < < 1 Vi. 

Through the parameters x* we obtain the expectation 
values of the properties X of interest: 



{xy = ^x(w)p(w|x*) = x{{wy 



(B62) 



w 



For instance, the expectation value of the weighted 
ANND defined in eq.(fl9l) is 



{Wjk) 



(B63) 



where we have used (W)* = (see main text). Simi- 
larly, the weighted clustering coefhcient defined in eq.( 20 ) 
has expectation value 



(B64) 



where 



x*x*/{l — x*x*). Similarly, according to 



eq.(B16) the standard deviation cr*[^] is 
a-[X] = 



\ 



, dX 

dwi 



(B65) 



/ W=(W) 



where (T*[i(;ij] = ^x*x* /(I — x*x*). The rule (B17) here 
reads 



dwt. 



SitSjs 



(B66) 



and allows to obtain a* [X] in terms of x* alone. Let us 
calculate explicitly the standard deviations of the con- 
straints: 



which in turn imply that 



(B67) 



(B68) 

Given the vertex i, if {wij}* <C 1, j = 1 . . . iV and j ^ i, 

— 1/2 

the trend decreases as . The more this condition 

is violated (the vertex i has an high strength, there are 
'strength-hubs' in the network, etc.), the more important 
becomes the correction. Note that for weighted networks 
the second term has a positive sign. This means that the 

— 1/2 

correction 'increases' the trend which now repre- 

sents a lower-bound for the coefficient of variation. 



Appendix C: NONLOCAL CONSTRAINTS 

Our model can also be applied to more complicated 
cases where the constraints are no longer local. However, 
a necessary condition for our method to work with non- 



local constraints is that eq.(A8) can still be expressed 



exactly in a form which does not require the enumera- 
tion of all possible graphs (in other words, the partition 
fun ction can be calculated analytically). In such a case, 
eq.(A10l can still be used to calculate the parameters 9* 



exactly as in the local case, and at the same time those 
parameters can be used to obtain the analytical expres- 
sions for the expected value and standard deviation of 
the topological properties of interest. Therefore, only a 
limited number of nonlocal constraints lend themselves 
to an analytical treatment. However, since the philos- 
ophy of randomization algorithms is always to enforce 
the simplest constraints in order to detect higher-order 
patterns, it turns out that the mathematically tractable 
constraints are also the ones of major interest. We now 
provide an explicit example of a choice of nonlocal con- 
straints that is often used in empirical studies, and at 
the same time preserves the analytical character of our 
method and yields exact results. 



1. Reciprocal configuration model 

As discussed in the main text, a more constrained null 
model for a binary directed network A* is one where the 
three reciprocal degree sequences fc~*'(A*), k^{A*) and 
fc^(A*) are specified, where 



fcr(A*)^^a*,(l-a^) 

fcr(A*) = E44 



(CI) 
(C2) 
(C3) 



The Hamiltonian for this model is 
H{A,dJ,j) = ^[a,fcr(A) + ftfcr(A) +7.fcr(A)] 

i 

The nonlocality is manifest in the fact that, unlike the 
previous examples, now the (second-order) constraints 
involve products of two adjacency matrix entries. De- 
spite this complication, the partition function can still 
be calculated exactly [H] as 



Z(a,/3, 7) =[](! + < 



+ e 



' -he" 



-7i-7j 



) (C4) 



The graph probability can still be expressed in the form 
pTl), i.e. 



P(A|a, /?, 7) - n A,(a^„ a,d<5, P, l) (C5) 
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{1 - a,j){l - aji) 



and 



In the above expression, 

A, K-,a,,|a,/3,7) = (p^T^ {ptjT^ {ptJT^ {p^jr^ 

is the dyadic probabihty defined in terms of 

= aij{l - aji) (C6) 

(C7) 
(C8) 
(C9) 

(CIO) 
(Cll) 
(C12) 

(C13) 

where we have set Xi = e~°'', fji = e~^' and Zi = e"'''' 
[22 j. The above expressions represent the dyadic expec- 
tation values. 

A httle algebra leads to the log-likelihood 

£{x, y,z) = - J2i<j ln(l + Xiyj + x^yi + z^zj) + 
E,; [fcr(A*)lna;, + fcr(A*)lny, + fcr(A*)lnz,] 



and the values x* , y* , z* that realize the maximum can 
alternatively [3D] found by solving the 3iV coupled equa- 
tions 



P^ = 




\ ^ 








Xiyj 




/x,y,z — 


1 - 


h Xiyj 


+ Xjyi - 


Z^Zj 


pt] = 












xjyi 




)s 


y,z — 


1 - 


h x^yj 


+ Xjyi - 


Zi Zj 


Fi] — 
















)s 


v,z — 


1 - 


h Xiyj 


+ Xj j/i - 


Z^Zj 


P^3 = 


{aTj 










1 




)s 


y,z = 


1 - 


h x^yj 


+ x^yi - 


Z^Zj 



Xi yj 

^ ^ + x*y* +x*y* + z*z* 



x^y. 



= kt{A*) Vz 



= kr{A*) vi 



triplet of dyadic relations defining motif m. The exact 
expectation value of Nm is 



m,2\ ^ I m.3\ 

^■fe ) Ki ) 



(C15) 



where (a™'^)* is given by evaluating eqs.(C10l-(C13) at 



devia 



ion 



the particular values af*, y^, z*. The standard 
of and in general of a top ological property X, can 
still be obtained using eq.(B16), i.e. 



dX 



(C16) 



CT a,;,, a 



/ A=(A)* 

dX dX 



* daij dajiy a=(a) 



where now 



(^*M)' = Kr(i-Kr) 



and 



a [fly, OjiJ = [aijaji) 
/„-<->\* 



which are both known exactly in terms of eqs.(ClO) 



(CIS). The calculations for the standard deviations of 
the constraints are similar to the directed configuration 
model case: 



(C17) 



which in turn imply that 



kf V (kfr 

(where a =— and similar observations hold 



(C18) 



corresponding to an example when eq.(AlO) can be writ- 
ten explicitly even if the constraints are nonlocal. We are 
looking for the solution where x* > 0, y* > and z* > 

V?:. 

The expectation values of topological properties in- 
volving products of dyadic terms can be obtained ex- 
actly without resorting to the linear approximation in 
eq.(A17|. For instance, the number of occurrences of a 



particular motif m, where m labels one of the possible 13 
non-isomorphic connected motifs with three vertices, is 



AT- \ ^ mA m.2 m,3 

N„, = a^j a^f^ a^i' 



(CM) 



where a"j'^ is one of the four possible dyadic relations a^, 
a'lj, a^, a*^, and {a™'\ aj^'^, a^'^} indicates the specific 



Appendix D: COMPARISON WITH 
COMPUTATIONAL MICROCANONICAL 
ALGORITHMS 

The LRA-based microcanonical approach [H [S] and 
our likelihood-based grandcanonical approach are in gen- 
eral not equivalent for finite networks. Let 2?(C) be the 
set of all graphs G that realize the enforced constraints 
C — {Ca} exactly. Both approaches assign equal prob- 
abilities to all graphs that realize the constraints, i.e. 
P(Gi) ^ P(G2) if Gi G V{C) and G2 £ V{C). Also, 
in both approaches these graphs are the most likely to 
occur, i.e. P(Gi) > P(G2) for any Gi G V{C) and 
G2 ^ T^{C). However the two approaches are differ- 
ent, the microcanonical one being very severe in assigning 
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vs 




FIG. 8. Difference between the LRA-based microcanonical 
approach and our hkehhood-based grandcanonical approach. 
Top: the microcanonical approach assigns non-zero probabil- 
ity only to the subset T>{C) of graphs that realize the enforced 
constraints C (in the example shown, a given value of the 
number of links L) exactly. Bottom: by contrast, our grand- 
canonical approach assigns non-zero probability to all graphs, 
but this probability reaches its maximum value for the graphs 
belonging to T>[C). In so doing, it is more robust to potential 
errors in the original network data (such as missing links). 



zero probability to any graph where the degrees are not 
matched exactly, i.e. P(G) = if G ^ T^{C)- By con- 
trast, in the grandcanonical approach all possible graphs 
can occur, even if with very different probabilities, in 
such a way that the ensemble average of the desired con- 
straints over all graphs coincides with the observed values 
(see fig|8]for an illustration of this difference). 

The above key and elegant property places grand- 
canonical ensembles at the basis of information theory. 
Notably, they are more robust to errors in the original 
data such as missing or overrepresented links. In presence 
of even a small percentage of such errors, the 'true' graph 
(the unobserved one affected by errors) will never appear 
in the microcanonical ensemble, while it will appear with 
nonzero probability in the grandcanonical ensemble. As 
desirable, for small deviations from the observed graph 
the true graph will have a slightly decreased probability 
with respect to the one assigned by our method to the 
observed graph, while for larger errors the probabilities 
will differ by a larger amount. 

Therefore, while for infinite systems the microcanon- 
ical and grandcanonical ensembles become equivalent 
since fluctuations about the average values become neg- 
ligible, in finite systems the use of grandcanonical en- 
sembles is preferable. What is of interest for us here is 
the impact of the two methods on the topological prop- 
erties induced on the randomized networks. To this end, 
we now show explicitly the relation between the two ap- 



proaches when applied to particular networks. We shall 
only consider unweighted networks for simplicity. 

In the unweighted (either directed or undirected) case, 
our method directly provides 'from the beginning' the 
explicit values of the probabilities pf^ that a link from i 
to j is there. The superscript G stands for 'grandcanon- 
ical', and the probability is evaluated at the parameter 
values that maximize the likelihood, as described above. 
By contrast, the microcanonical approach samples the 
configuration space iteratively, and the microcanonical 
probability pf- that a link from i to j is there can only 
be evaluated as the frequency of occurrence of the link 
over many randomizations. As the number of random- 
ized networks increases, this frequency will converge to 
pfj. However this asymptotic value will also depend on 
the number R of elementary rewiring steps used to ob- 
tain a single randomized network. To see this, consider 
the trivial case R — 0. As no rewiring takes place, all 
the 'randomized' networks will in fact coincide with the 
original network. If the adjacency matrix of the latter 
has elements {oij}, this means that pf^ = aij. If R is 
nonzero but still very small, pf^ will not change sub- 
stantially. Only if R is large enough then p^ will ap- 
proach pfy This is shown explicitly in figjoj where we 
plot pfj as a function of pf^ for all directed pairs of ver- 
tices (i,j) by taking the Little Rock Lake food web as 
the starting network. As R increases from i? = to 
R — 10000, the double-peaked shape (corresponding to 
pfj = aij independently oipf^) evolves towards the iden- 
tity pfj — pfj. Similar evolution patterns are observed 
for all the networks we analyzed. This clearly shows that 
in our method we obtain 'from the beginning' the values 
pfj to which the microcanonical pfj will converge only af- 
ter several iterations. Notably, the number R of rewiring 
steps required for pfj to converge to pf, acceptably is not 
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FIG. 9. Convergence of the microcanonical connection proba- 
bility pfj (measured using the local rewiring algorithm) to the 
grandcanonical probability pfj (obtained using our maximum- 
entropy method) as the number R of local rewiring moves per 
network increases. 
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known a priori and without the knowledge of pf^ itself. 
This problematic aspect of the microcanonical approach 
highlights another advantage of the grandcanonical one. 

The two approaches are in general not equivalent for 
finite networks. We can now state this more rigorously, 
and indicate at least two ways in which they may differ. 

First of all, pf^ represent marginal probabilities, where 
the information about the correlations between the pres- 
ence of a link between different pairs of vertices has been 
lost. While in the grandcanonical approach these correla- 
tions are absent, and different pairs of vertices are always 
statistically independent, in the microcanonical approach 
some weak correlations will be preserved even after many 
rewiring steps. These correlations arise from the micro- 
canonical constraint of matching the degree sequence (or 
other contraints) exactly. Thus, while our grandcanoni- 
cal method enables to compute the expected topological 
properties exactly, in the microcanonical approach this 
is not possible. 

Secondly, the final 'convergence' of pf- 
R ^ oo will in general not hold exactly. This means 
that the asymptotic plot of pfj versus pfj will not be a 
strict identity, but a narrow scatter of points close to the 
identity. In other words, increasing R beyond a certain 
value will not make the quantities converge further. For 
some networks (such as the Little Rock Lake food web 
shown above) one may attain a better convergence than 
for others. 

It is interesting to understand whether the degree of 
convergence between the two approaches depends on 
some property of the network. To this end, we first define 
two measures of discrepancy between {pfj} and {pfj}, 
and then study how they behave on well-controlled, arti- 
ficially generated networks. As measures of discrepancy, 
we consider the P distance 



to pfj for 



Xi drawn randomly in the interval [0, 1], and established 
an edge between each pair of vertices i and j with prob- 
ability Pij = ZXiXj/{l + ZXiXj). 

This choice generates maximally random networks 
with degree distribution controlled by {xi\ as in 
eq.(B23), but has an additional parameter z that tunes 
the overall link density d = 2L/N(N — 1), representing 
the fraction of realized links. With {xi} kept constant, 
we considered various choices of z and, for each of them, 
adopted both the microcanonical randomization and our 
grandcanonical method. 

In fig[TO]we show the resulting difference between the 
marginal probabilities {pfj} and {pfJ}, as a function of 
link density. The two methods yield very similar results 
for both small and large link density, whereas for inter- 
mediate density values they display a greater difference. 
Even in this case, however, the distances between them 
are A/2 w 0.05 and Akl ~ 0.12, both small considering 
their possible range of variation. 
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and the KuUback-Leibler information distance 



(Dl) 



(D2) 



N{N-1) 
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(note that we have normalized the above distances in 
such a way that both he in the range [0, 1]). It is instruc- 
tive to use these distances to compare the two methods 
on a family of artificially generated networks. We consid- 
ered TV = 100 vertices, assigned each vertex a real value 



FIG. 10. KuUback-Leibler {Akl, green squares) and / 
(A;2, blue circles) distance between microcanonical (pfj ) and 
grandcanonical {pfj) marginal connection probabilities, plot- 
ted versus link density d. 
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